Efficient Non-linear Changed Mel-filter Bank VAD Algorithm
Abstract
This paper introduces an efficient non-linear changed mel-filter bank (MFB) voice activity detection (VAD) algorithm. The non-linearly changed mel-filter bank outputs improve detection of the parts of the speech signal where vowels, diphthongs and semivowels are present. To detect consonants in the speech signal as reliably as possible, hangover and hangbefore criteria are used, and a phoneme duration analysis was carried out to set them. The duration of vowels, diphthongs and semivowels defines how many frames must be detected as speech before the hangover and hangbefore criteria are applied at all. The duration of consonants defines how many frames the hangover and hangbefore criteria cover. Comparative tests were made between the MFB VAD algorithm with and without the non-linear function. Experiments were also run on the four VAD algorithms used in the ITU G.729, ITU G.723.1, DSR ETSI ES 202 050, and DSR ETSI ES 202 211 standards. Introducing the non-linear function into the MFB VAD algorithm reduces the errors caused by incorrect voice activity detection.
Keywords: voice activity detection, signal processing, hangover criterion, hangbefore criterion
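The hangover and hangbefore criteria described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `apply_hang_criteria` and the parameters `min_speech_run`, `hangbefore` and `hangover` are hypothetical, and the abstract does not give the frame counts derived from the phoneme duration analysis. The sketch also assumes that speech runs shorter than `min_speech_run` are left as they are and simply not extended.

```python
from typing import List


def apply_hang_criteria(
    raw: List[int],
    min_speech_run: int,
    hangbefore: int,
    hangover: int,
) -> List[int]:
    """Extend long enough speech runs backwards and forwards in time.

    raw            -- per-frame VAD decisions (1 = speech, 0 = non-speech)
    min_speech_run -- hypothetical threshold: frames that must be detected
                      as speech in a row before any extension is applied
    hangbefore     -- hypothetical number of frames marked as speech
                      before the run
    hangover       -- hypothetical number of frames marked as speech
                      after the run
    """
    n = len(raw)
    out = list(raw)
    i = 0
    while i < n:
        if not raw[i]:
            i += 1
            continue
        # Find the end of the current run of speech frames: raw[i:j].
        j = i
        while j < n and raw[j]:
            j += 1
        # Only runs that are long enough trigger the two criteria.
        if j - i >= min_speech_run:
            for k in range(max(0, i - hangbefore), i):
                out[k] = 1  # hangbefore: frames preceding the run
            for k in range(j, min(n, j + hangover)):
                out[k] = 1  # hangover: frames following the run
        i = j
    return out


if __name__ == "__main__":
    frames = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
    print(apply_hang_criteria(frames, min_speech_run=3, hangbefore=2, hangover=1))
    # prints [0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0]
```

In this example the three-frame run is extended by two frames before and one frame after, while the isolated single speech frame is too short to trigger either criterion.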
Similar resources
A Computationally Efficient Mel-Filter Bank VAD Algorithm for Distributed Speech Recognition Systems
This paper presents a novel computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When using VAD algorithms in telecommunication systems, the required capacity of the speech transmission channel can be reduced if only the speech parts of the signal are transmitted. A similar objective ...
Improved voice activity detection algorithm using wavelet and support vector machine
This paper proposes an improved voice activity detection (VAD) algorithm using wavelet and support vector machine (SVM) for European Telecommunications Standards Institute (ETSI) adaptive multi-rate (AMR) narrow-band (NB) and wide-band (WB) speech codecs. First, based on the wavelet transform, the original IIR filter bank and pitch/tone detector are implemented, respectively, via the wavelet f...
A comparative study of performance of FPGA based mel filter bank & bark filter bank
The sensitivity of the human ear depends on frequency, which is resolved nonlinearly across the audio spectrum. Improving recognition performance in a similarly nonlinear way requires a front-end design suggested by empirical evidence. A popular alternative to linear prediction based analysis is therefore filter bank analysis since this provides a much more straightforward route ...
An Efficient Algorithm to Design Nearly Perfect-Reconstruction Two-Channel Quadrature Mirror Filter Banks
In this paper, a novel technique for the design of two-channel Quadrature Mirror Filter (QMF) banks with linear phase in the frequency domain is presented. To satisfy the exact reconstruction condition of the filter bank, the low-pass prototype filter response in the pass-band, transition band and stop band is optimized using an unconstrained indirect update optimization method. The objective function is form...
Improving the filter bank of a classic speech feature extraction algorithm
The most popular speech feature extractor used in automatic speech recognition (ASR) systems today is the mel frequency cepstral coefficient (MFCC) algorithm. Introduced in 1980, the filter bank-based algorithm eventually replaced linear prediction cepstral coefficients (LPCC) as the premier front end, primarily because of MFCC's superior robustness to additive noise. However, MFCC does not app...
Publication date: 2012